ON OPTIMALITY OF THE SHIRYAEV-ROBERTS PROCEDURE FOR

In 1985, for detecting a change in distribution, Pollak introduced a specific minimax performance metric and a randomized version of the Shiryaev-Roberts procedure where the zero initial condition is replaced by a random variable sampled from the quasi-stationary distribution of the Shiryaev-Roberts statistic. Pollak proved that this procedure is third-order asymptotically optimal as the mean time to false alarm becomes large. The question whether Pollak's procedure is strictly minimax for any false alarm rate has been open for more than two decades, and there were several attempts to prove this strict optimality. In this paper, we provide a counterexample which shows that Pollak's procedure is not optimal and that there is a strictly optimal procedure which is nothing but the Shiryaev-Roberts procedure that starts with a specially designed deterministic point.
1. Introduction and preliminaries. Changepoint problems deal with detecting a change in the distribution of observed data that occurs at an unknown point in time. Let $X_1, X_2, \dots$ be the series of observations being monitored, and let $\nu$ be the serial number of the last pre-change observation, so that $X_{\nu+1}$ is the first post-change observation. Let $P_\nu$ and $E_\nu$ denote probability and expectation when the change occurs at $\nu + 1$ for a fixed $\nu < \infty$, and let $P_\infty$ and $E_\infty$ denote the same when $\nu = \infty$ (i.e., there never is a change). A sequential change detection procedure is a stopping time $T$ adapted to the observations $X_1, X_2, \dots$, i.e., $\{T \le n\} \in \mathcal{F}_n$, where $\mathcal{F}_n = \sigma(X_1, \dots, X_n)$ is the sigma-algebra generated by the first $n$ observations.

Common operating characteristics of a sequential detection procedure are the Average Run Length (ARL) to False Alarm, i.e., the expected number of observations to an alarm assuming that there is no change, and the Average Delay to Detection, i.e., the expected delay between a change and its detection. The goal is to find a detection procedure that minimizes the average detection delay subject to a bound on the ARL to false alarm.
In this paper, we will be interested in the simple changepoint problem setting, where the observations are independent, i.i.d. pre-change with density $f_\infty$ and i.i.d. post-change with density $f_0$. In other words, it is assumed that $X_n$ has density $f_\infty$ for $n \le \nu$ and density $f_0$ for $n > \nu$, where both $f_\infty$ and $f_0$ are known but the changepoint $\nu$ is unknown. Therefore, the conditional density of the sample $(X_1, \dots, X_n)$ for the fixed changepoint $\nu = k$ is

    $p(X_1, \dots, X_n \mid \nu = k) = \prod_{i=1}^{k} f_\infty(X_i) \times \prod_{i=k+1}^{n} f_0(X_i),$

where $\prod_{i=j}^{n} f_0(X_i) = 1$ when $j > n$.
In 1961, for detecting a change in the drift of a Brownian motion, Shiryaev introduced a change detection procedure, which is now usually referred to as the Shiryaev-Roberts (SR) procedure (Shiryaev, 1961, 1963 and Roberts, 1966). The SR procedure calls for stopping and raising an alarm at

(1)    $T_{\mathrm{sr}}(A) = \inf\{n \ge 1 : R_n \ge A\}, \quad \inf\{\varnothing\} = \infty,$

where

(2)    $R_n = \sum_{k=0}^{n-1} \frac{p(X_1, \dots, X_n \mid \nu = k)}{p(X_1, \dots, X_n \mid \nu = \infty)} = \sum_{k=1}^{n} \prod_{i=k}^{n} \frac{f_0(X_i)}{f_\infty(X_i)}$

is the SR statistic and $A > 0$ is a threshold that controls the false alarm rate.

This procedure has a number of interesting optimality properties. In particular, if $A = A_\gamma$ is such that $E_\infty[T_{\mathrm{sr}}(A_\gamma)] = \gamma$, then it minimizes the integral average detection delay

    $\frac{\sum_{\nu=0}^{\infty} E_\nu (T - \nu)^+}{E_\infty T}$

over all stopping times $T$ that satisfy

(3)    $E_\infty T \ge \gamma,$

where $\gamma > 1$ is a value set before the surveillance begins (cf. Pollak and Tartakovsky, 2009 and also Feinberg and Shiryaev, 2006 for the Brownian motion model).
Note that the SR statistic (2) can be written recursively as

(4)    $R_n = (1 + R_{n-1}) \Lambda_n, \quad n \ge 1, \quad R_0 = 0,$

where $\Lambda_n = f_0(X_n) / f_\infty(X_n)$ is the likelihood ratio. Therefore, the classical SR statistic starts from 0.
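The recursion (4) and the stopping rule (1) can be sketched in a few lines of Python. This is only an illustration: the function names, the sample observations and the threshold value are hypothetical, and the likelihood ratio assumes the exponential model with pre-change mean 1 and post-change rate 2 that the paper studies in Section 3. The optional `start` argument is also an assumption of the sketch; it allows a nonzero initial value.

```python
import math


def sr_stopping_time(likelihood_ratios, threshold, start=0.0):
    """Apply R_n = (1 + R_{n-1}) * Lambda_n and stop at the first n with R_n >= threshold.

    Returns (n, path), or (None, path) if the threshold is never reached
    on the supplied data.
    """
    stat = start
    path = []
    for n, lam in enumerate(likelihood_ratios, start=1):
        stat = (1.0 + stat) * lam
        path.append(stat)
        if stat >= threshold:
            return n, path
    return None, path


def exp_likelihood_ratio(x, theta=2.0):
    """Lambda = f_0(x) / f_inf(x) for f_inf = Exp(1) and f_0 = Exp(rate theta) (assumed model)."""
    return theta * math.exp(-(theta - 1.0) * x)


# Hypothetical data: small observations favour the post-change density.
observations = [1.5, 0.9, 0.2, 0.1, 0.05]
ratios = [exp_likelihood_ratio(x) for x in observations]
alarm_time, sr_path = sr_stopping_time(ratios, threshold=3.0)
print(alarm_time, sr_path)
```

Because the statistic is updated from its previous value only, the detector needs constant memory regardless of how many observations have been processed.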

Pollak (1985) introduced a natural worst-case detection delay measure, the supremum average delay to detection

    $\mathcal{J}_P(T) = \sup_{0 \le \nu < \infty} E_\nu(T - \nu \mid T > \nu),$

and attempted to find an optimal procedure that would minimize $\mathcal{J}_P(T)$ over all procedures subject to constraint (3). Pollak's idea was to modify the SR statistic by randomization of the initial condition $R_0$ in (4) in order to make it an equalizer (i.e., to make the conditional average detection delay $E_\nu(T - \nu \mid T > \nu)$ independent of the changepoint $\nu$). Pollak's version of the SR procedure starts from a random point sampled from the quasi-stationary distribution of the SR statistic $R_n$. He proved that this randomized procedure is asymptotically (as $\gamma \to \infty$) optimal within an additive term of order $o(1)$ in the sense of minimizing the supremum average detection delay $\mathcal{J}_P(T)$.
To be specific, let, for $B > 0$,

    $Q_B(x) = \lim_{n \to \infty} P_\infty(R_n \le x \mid T_{\mathrm{sr}}(B) > n)$

denote the quasi-stationary distribution of the SR statistic and let $R_n^{Q_B}$ be given recursively

(5)    $R_n^{Q_B} = (1 + R_{n-1}^{Q_B}) \Lambda_n, \quad n \ge 1, \quad R_0^{Q_B} \sim Q_B,$
where $R_0^{Q_B} \sim Q_B$ means that $R_0^{Q_B}$ is a random variable distributed according to the quasi-stationary distribution $Q_B$. The corresponding stopping time is given by

(6)    $T_{\mathrm{srp}}(B) = \inf\{n \ge 1 : R_n^{Q_B} \ge B\}, \quad \inf\{\varnothing\} = \infty.$

Pollak (1985) proved that if $B = B_\gamma$ is selected so that $E_\infty[T_{\mathrm{srp}}(B_\gamma)] = \gamma$, then

(7)    $\mathcal{J}_P(T_{\mathrm{srp}}(B_\gamma)) - \inf_{\{T : E_\infty T \ge \gamma\}} \mathcal{J}_P(T) = o(1)$ as $\gamma \to \infty,$

where $o(1) \to 0$ as $\gamma \to \infty$. We will call this asymptotic optimality property third-order asymptotic optimality as opposed to the second-order optimality when the corresponding difference is bounded (i.e., $O(1)$) and the first-order optimality when the ratio of the corresponding values tends to 1. Therefore, the procedure given by (5) and (6), which we will refer to as the Shiryaev-Roberts-Pollak (SRP) procedure, is third-order asymptotically optimal for the low false alarm rate. Note that this result is extremely strong since the difference between the average detection delays in (7) is asymptotically small while each of them is of order $O(\log \gamma)$ (i.e., both terms go to infinity). It can be also deduced from Pollak (1985, 1987) that the conventional SR procedure is asymptotically minimax for a low false alarm rate within an additive term of order $O(1)$, i.e., it is only second-order asymptotically optimal.

Since the SRP procedure is an equalizer, i.e., $\mathcal{J}_P(T_{\mathrm{srp}}) = E_0 T_{\mathrm{srp}} = E_\nu(T_{\mathrm{srp}} - \nu \mid T_{\mathrm{srp}} > \nu)$ for all $\nu \ge 0$, it is tempting to conjecture that it may in fact be strictly optimal for every $\gamma > 1$. However, to date there is no proof or disproof of this conjecture (see Yakir, 1997 and Mei, 2006). Recent work of Moustakides, Polunchenko, and Tartakovsky (2010) indicates that the SRP procedure may not be exactly optimal and partially sheds light on this issue by considering a generalization of the SR procedure that starts from a specially designed deterministic point $r$. To emphasize the dependence on the starting point, this procedure will be referred to as the SR-r procedure. Specifically, define the stopping time

(8)    $T_{\mathrm{sr}}^{r}(A) = \inf\{n \ge 1 : R_n^r \ge A\}, \quad \inf\{\varnothing\} = \infty,$

where $R_n^r$ obeys the recursion

(9)    $R_n^r = (1 + R_{n-1}^r) \Lambda_n, \quad n \ge 1, \quad R_0^r = r \ge 0.$

Solving numerically integral equations for performance metrics for two examples that involve Gaussian and exponential models, Moustakides, Polunchenko, and Tartakovsky (2010) found that the SR-r procedure (with a certain $r = r_\gamma$ that depends on $\gamma$) uniformly outperforms the SRP procedure, i.e., $E_\nu(T_{\mathrm{sr}}^{r} - \nu \mid T_{\mathrm{sr}}^{r} > \nu) < E_0 T_{\mathrm{srp}}$ for all $\nu \ge 0$. We believe that these results present serious evidence against optimality of the SRP procedure. However, this may not be completely convincing since a small numerical error can be present in such experiments.

In the present paper, we construct a counterexample where all computations can be performed analytically. This example proves that the SRP procedure is not optimal while the SR-r procedure with a deterministic initialization is optimal. This result answers a long-standing question on optimality of the SRP procedure and opens a new direction in the quest for the unknown optimum.

2. The main theorem and integral equations for operating characteristics. Theorem 1 below provides a lower bound for the infimum of Pollak's worst-case measure $\mathcal{J}_P(T)$, which will be used to find the optimal changepoint detection procedure in Section 3. Note that a proof sketch of this lower bound has been previously given in Moustakides, Polunchenko, and Tartakovsky (2010). Here we provide a complete proof.

We first need the following lemma which establishes optimality of the SR-r procedure with 
respect to the integral average detection delay. 
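Lemma 1. Let

(10)    $\mathcal{I}_r(T) = \frac{r E_0 T + \sum_{\nu=0}^{\infty} E_\nu (T - \nu)^+}{r + E_\infty T}$

and let $T_{\mathrm{sr}}^{r}(A_\gamma)$ be the SR-r detection procedure with $E_\infty[T_{\mathrm{sr}}^{r}(A_\gamma)] = \gamma$. For any $r \ge 0$, the SR-r procedure minimizes $\mathcal{I}_r(T)$ over all procedures with $E_\infty T \ge \gamma$, i.e.,

(11)    $\inf_{\{T : E_\infty T \ge \gamma\}} \mathcal{I}_r(T) = \mathcal{I}_r(T_{\mathrm{sr}}^{r}(A_\gamma)).$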

Proof. The proof of (11) for $r = 0$ is given in Pollak and Tartakovsky (2009, Theorem 1 and Corollary 1). We now give its extension for an arbitrary positive $r$.

Consider the following Bayesian problem, which will be denoted by $B(\pi, p, c)$. Suppose $\nu$ is a random variable (independent of the observations) with a zero-modified geometric distribution

    $\mathbf{P}(\nu < 0) = \pi, \quad \mathbf{P}(\nu = k) = (1 - \pi) p (1 - p)^k, \quad k \ge 0,$

and the losses associated with stopping at time $T$ are 1 if $T \le \nu$ and $c \cdot (T - \nu)$ if $T > \nu$, where $0 \le \pi < 1$, $0 < p < 1$ and $c > 0$ are fixed constants. For an event $\mathcal{A}$ define the probability

    $\mathbf{P}(\mathcal{A}) = \pi P_0(\mathcal{A}) + (1 - \pi) \sum_{k=0}^{\infty} p (1 - p)^k P_k(\mathcal{A})$

and let $\mathbf{E}$ denote the corresponding expectation.

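Solving $B(\pi, p, c)$ requires minimization of the expected loss

    $\mathcal{L}_{\pi,p,c}(T) = \mathbf{P}(T \le \nu) + c\, \mathbf{E}(T - \nu)^+$

or, equivalently, maximization of the expected "gain" $\frac{1}{p}[1 - \mathcal{L}_{\pi,p,c}(T)]$, and the Bayes rule for this problem is given by the Shiryaev procedure (cf. Shiryaev, 1963)

    $T_{\pi,p,c} = \inf\{n \ge 1 : R_{n,p}^{\pi} \ge A_{\pi,p,c}\},$

where $A_{\pi,p,c} > \pi / [(1 - \pi) p]$ is an appropriate threshold and

    $R_{n,p}^{\pi} = \frac{\pi}{(1 - \pi) p} \prod_{i=1}^{n} \frac{\Lambda_i}{1 - p} + \sum_{k=1}^{n} \prod_{i=k}^{n} \frac{\Lambda_i}{1 - p}.$

Let $\pi = rp$. Then, obviously, $R_{n,p}^{\pi = rp} \to R_n^r$ as $p \to 0$.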
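The limit in the last sentence can be checked numerically. The sketch below assumes the Shiryaev statistic has the form $R_{n,p}^{\pi} = \frac{\pi}{(1-\pi)p}\prod_{i=1}^{n}\frac{\Lambda_i}{1-p} + \sum_{k=1}^{n}\prod_{i=k}^{n}\frac{\Lambda_i}{1-p}$, which is the form implied by the prior and the loss defined above; the function names and the likelihood-ratio values are hypothetical.

```python
def shiryaev_statistic(ratios, r, p):
    """Assumed Shiryaev statistic with pi = r * p.

    It starts at pi / ((1 - pi) * p) = r / (1 - r * p) and is updated by
    R <- (1 + R) * Lambda / (1 - p).
    """
    stat = r / (1.0 - r * p)
    for lam in ratios:
        stat = (1.0 + stat) * lam / (1.0 - p)
    return stat


def sr_r_statistic(ratios, r):
    """SR-r statistic: starts at r and is updated by R <- (1 + R) * Lambda."""
    stat = r
    for lam in ratios:
        stat = (1.0 + stat) * lam
    return stat


ratios = [0.8, 1.3, 0.6, 1.7]   # hypothetical likelihood ratios
r = 0.5
target = sr_r_statistic(ratios, r)
gaps = [shiryaev_statistic(ratios, r, p) - target for p in (1e-1, 1e-2, 1e-3, 1e-4)]
print(target, gaps)
```

Under these assumptions the gap is positive for every $p > 0$ and shrinks as $p$ decreases, which is the sense in which the SR-r statistic is the limit of the Bayes statistic.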

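Now, it follows from Pollak (1985) that there are a constant $0 < c^* < \infty$ and a sequence $\{p_i, c_i\}_{i \ge 1}$ with $p_i \to 0$, $c_i \to c^*$ as $i \to \infty$, such that $T_{\mathrm{sr}}^{r}(A_\gamma)$ is the limit of the Bayes stopping times $T_{\pi = r p_i, p_i, c_i}$ as $i \to \infty$ and

(12)    $\limsup_{p \to 0,\, c \to c^*} \frac{1 - \mathcal{L}_{\pi = rp, p, c}(T_{\pi = rp, p, c})}{1 - \mathcal{L}_{\pi = rp, p, c}(T_{\mathrm{sr}}^{r}(A_\gamma))} = 1.$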

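Next, for any stopping time $T$, with $\pi = rp$,

    $\frac{1}{p} \mathbf{E}(T - \nu)^+ = \frac{\pi + (1 - \pi) p}{p} E_0 T + \frac{1 - \pi}{p} \sum_{k=1}^{\infty} p (1 - p)^k E_k (T - k)^+$

    $\qquad = [r + (1 - rp)] E_0 T + (1 - rp) \sum_{k=1}^{\infty} (1 - p)^k E_k (T - k)^+$

    $\qquad \to r E_0 T + \sum_{k=0}^{\infty} E_k (T - k)^+$ as $p \to 0,$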
and

    $\frac{1}{p} \mathbf{P}(T > \nu) = \frac{1}{p} \Big[ \pi + (1 - \pi) p + (1 - \pi) \sum_{k=1}^{\infty} p (1 - p)^k P_k(T > k) \Big]$

    $\qquad \to r + \sum_{k=0}^{\infty} P_\infty(T > k) = r + E_\infty T$ as $p \to 0,$

where we used the fact that $P_k(T > k) = P_\infty(T > k)$ since by the definition of the stopping time the event $\{T \le k\}$ belongs to the $\sigma$-algebra $\mathcal{F}_k$ and at time instant $k$ the distribution is still $f_\infty$. Since

    $\frac{1}{p} [1 - \mathcal{L}_{\pi,p,c}(T)] = \frac{1}{p} \mathbf{P}(T > \nu) - c\, \frac{1}{p} \mathbf{E}(T - \nu)^+,$

it follows that if $\pi = rp$, then for any stopping time $T$ with $E_\infty T < \infty$

    $\frac{1}{p} [1 - \mathcal{L}_{\pi = rp, p, c}(T)] \to (r + E_\infty T) - c \Big( r E_0 T + \sum_{k=0}^{\infty} E_k (T - k)^+ \Big)$ as $p \to 0,$

which together with (12) establishes that the SR-r procedure minimizes $\mathcal{I}_r(T)$ over all stopping times that satisfy $E_\infty T = \gamma$. In order to prove that (11) holds in the class $\{T : E_\infty T \ge \gamma\}$ it suffices to apply the argument identical to that used in the proof of Corollary 1 in Pollak and Tartakovsky (2009). □

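Theorem 1. Let $T_{\mathrm{sr}}^{r}(A)$ be defined as in (8) and let $A = A_\gamma$ be selected so that $E_\infty[T_{\mathrm{sr}}^{r}(A_\gamma)] = \gamma$. Then for every $r \ge 0$

(13)    $\inf_{\{T : E_\infty T \ge \gamma\}} \mathcal{J}_P(T) \ge \frac{r E_0[T_{\mathrm{sr}}^{r}(A_\gamma)] + \sum_{\nu=0}^{\infty} E_\nu [T_{\mathrm{sr}}^{r}(A_\gamma) - \nu]^+}{r + E_\infty[T_{\mathrm{sr}}^{r}(A_\gamma)]}.$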

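Proof. Note first that for any stopping time $T$

    $\sum_{\nu=0}^{\infty} E_\nu (T - \nu)^+ = \sum_{\nu=0}^{\infty} P_\nu(T > \nu) E_\nu(T - \nu \mid T > \nu) = \sum_{\nu=0}^{\infty} P_\infty(T > \nu) E_\nu(T - \nu \mid T > \nu),$

where again we used the fact that $P_\nu(T > \nu) = P_\infty(T > \nu)$. Since

    $\mathcal{J}_P(T) = \sup_{k \ge 0} E_k(T - k \mid T > k) \ge E_\nu(T - \nu \mid T > \nu)$ for any $\nu \ge 0$

and

    $\mathcal{J}_P(T) = \frac{\mathcal{J}_P(T) [r + \sum_{\nu=0}^{\infty} P_\infty(T > \nu)]}{r + E_\infty T} = \frac{r \mathcal{J}_P(T) + \sum_{\nu=0}^{\infty} \mathcal{J}_P(T) P_\infty(T > \nu)}{r + E_\infty T},$

where $\sum_{\nu=0}^{\infty} P_\infty(T > \nu) = E_\infty T$, we obtain that for any stopping time $T$ with finite ARL to false alarm

    $\mathcal{J}_P(T) \ge \frac{r E_0 T + \sum_{\nu=0}^{\infty} E_\nu(T - \nu \mid T > \nu) P_\infty(T > \nu)}{r + E_\infty T} = \frac{r E_0 T + \sum_{\nu=0}^{\infty} E_\nu (T - \nu)^+}{r + E_\infty T}.$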
Therefore,

(14)    $\inf_{\{T : E_\infty T \ge \gamma\}} \mathcal{J}_P(T) \ge \inf_{\{T : E_\infty T \ge \gamma\}} \mathcal{I}_r(T),$

where $\mathcal{I}_r(T)$ is defined in (10).

By Lemma 1, the infimum on the right-hand side in (14) is attained for the SR-r detection procedure $T_{\mathrm{sr}}^{r}(A_\gamma)$, which completes the proof. □

Notice that if $r$ can be chosen so that the SR-r procedure becomes an equalizer (i.e., $E_0 T_{\mathrm{sr}}^{r} = E_\nu(T_{\mathrm{sr}}^{r} - \nu \mid T_{\mathrm{sr}}^{r} > \nu)$ for $\nu \ge 0$), then it is optimal since the right-hand side in (13) is equal to $E_0 T_{\mathrm{sr}}^{r}$, which in turn is equal to $\sup_{\nu \ge 0} E_\nu(T_{\mathrm{sr}}^{r} - \nu \mid T_{\mathrm{sr}}^{r} > \nu) = \mathcal{J}_P(T_{\mathrm{sr}}^{r})$. This observation will be used in Section 3 for proving that the SR-r procedure with a specially designed $r = r_A$ is strictly optimal for an exponential model.

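Introduce the following notation:

    $\delta_\nu(r) = E_\nu(T_{\mathrm{sr}}^{r} - \nu)^+, \quad \rho_\nu(r) = P_\infty(T_{\mathrm{sr}}^{r} > \nu), \quad \nu \ge 0;$

    $\phi(r) = E_\infty T_{\mathrm{sr}}^{r}, \quad \psi(r) = \sum_{\nu=0}^{\infty} E_\nu(T_{\mathrm{sr}}^{r} - \nu)^+,$

where, obviously, $\rho_0(r) = 1$ and $\delta_0(r) = E_0 T_{\mathrm{sr}}^{r}$.

In the rest of the paper we will assume for simplicity that $\Lambda_1$ is continuous. For $i = 0, \infty$, let $F_i(x) = P_i(\Lambda_1 \le x)$ denote the distribution functions of the likelihood ratio under the change and no-change hypotheses.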

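Moustakides, Polunchenko, and Tartakovsky (2010, 2009) used the Markov property of the SR-r statistic (9) to obtain the following integral equations for performance metrics

(15)    $\phi(r) = 1 + \int_0^A \phi(x) \frac{\partial}{\partial x} F_\infty\Big(\frac{x}{1 + r}\Big) dx,$

(16)    $\delta_0(r) = 1 + \int_0^A \delta_0(x) \frac{\partial}{\partial x} F_0\Big(\frac{x}{1 + r}\Big) dx,$

(17)    $\delta_\nu(r) = \int_0^A \delta_{\nu-1}(x) \frac{\partial}{\partial x} F_\infty\Big(\frac{x}{1 + r}\Big) dx, \quad \nu \ge 1,$

(18)    $\rho_\nu(r) = \int_0^A \rho_{\nu-1}(x) \frac{\partial}{\partial x} F_\infty\Big(\frac{x}{1 + r}\Big) dx, \quad \nu \ge 1,$

(19)    $\psi(r) = \delta_0(r) + \int_0^A \psi(x) \frac{\partial}{\partial x} F_\infty\Big(\frac{x}{1 + r}\Big) dx.$

The conditional average delay to detection of the SR-r procedure is computed as

    $E_\nu(T_{\mathrm{sr}}^{r} - \nu \mid T_{\mathrm{sr}}^{r} > \nu) = \frac{\delta_\nu(r)}{\rho_\nu(r)}, \quad \nu \ge 0,$

and the lower bound as

    $\mathcal{I}_r(T_{\mathrm{sr}}^{r}) = \frac{r \delta_0(r) + \psi(r)}{r + \phi(r)}.$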

Next, we present integral equations for the operating characteristics of the randomized SRP procedure (5), (6). Here the most crucial problem is the computation of the quasi-stationary distribution $Q_B(x)$ of the SR statistic. By Harris (1963, Theorem III.10.1), in the continuous case the quasi-stationary distribution exists. Its density $q_B(x) = dQ_B(x)/dx$ satisfies the following integral equation

(20)    $\lambda_B\, q_B(x) = \int_0^B q_B(r) \frac{\partial}{\partial x} F_\infty\Big(\frac{x}{1 + r}\Big) dr$

(see Pollak, 1985), where $\lambda_B$ is the leading eigenvalue of the linear operator associated with the kernel

    $K_\infty(x, r) = \frac{\partial}{\partial x} F_\infty\Big(\frac{x}{1 + r}\Big), \quad x, r \in [0, B).$

Thus, $q_B(x)$ is the corresponding (left) eigenfunction. It also satisfies the constraint

(21)    $\int_0^B q_B(x)\, dx = 1.$

Equations (20) and (21) uniquely define $\lambda_B$ and $q_B(x)$. The equations have unique solutions, since $\lambda_B < 1$, as follows from Moustakides, Polunchenko, and Tartakovsky (2010).

Once $q_B(x)$ is available we can compute the ARL to false alarm and the average detection delay of the SRP procedure $T_{\mathrm{srp}}$:

(22)    $E_\infty[T_{\mathrm{srp}}(B)] = \int_0^B \phi(r)\, q_B(r)\, dr,$

(23)    $E_0[T_{\mathrm{srp}}(B)] = \int_0^B \delta_0(r)\, q_B(r)\, dr.$

We recall that the SRP procedure is an equalizer: $E_\nu(T_{\mathrm{srp}} - \nu \mid T_{\mathrm{srp}} > \nu) = E_0 T_{\mathrm{srp}}$.

The integral equations derived above are Fredholm equations of the second kind. Usually, they 
do not allow for an analytical solution and should be solved numerically. However, in the next 
section, we provide an example where analytical solutions can be obtained. 
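As an illustration of the numerical route, the sketch below solves the ARL equation (15) by successive approximations on a midpoint grid. It is not the solver used in the cited work: the function name, grid size, iteration count and threshold value are hypothetical, and the kernel $\frac{\partial}{\partial x} F_\infty(x/(1+r)) = \frac{1}{2(1+r)}$ is an assumption that holds for the exponential model of the next section when $A < 2$. The result is compared with the closed form (31) derived there.

```python
import math


def solve_fredholm(kernel, free_term, upper, n_grid=100, n_iter=100):
    """Solve u(r) = free_term(r) + int_0^upper u(x) * kernel(x, r) dx by Picard iteration.

    The integral is discretised with the midpoint rule. The iteration converges
    when the integral operator is a contraction, which holds for the assumed kernel.
    """
    h = upper / n_grid
    nodes = [(j + 0.5) * h for j in range(n_grid)]
    # weights[j][i] = h * kernel(x_i, r_j)
    weights = [[h * kernel(x, r) for x in nodes] for r in nodes]
    u = [free_term(r) for r in nodes]
    for _ in range(n_iter):
        u = [free_term(r) + sum(w * v for w, v in zip(row, u))
             for r, row in zip(nodes, weights)]
    return nodes, u


A = 1.5  # hypothetical threshold, below 2
nodes, phi = solve_fredholm(lambda x, r: 0.5 / (1.0 + r), lambda r: 1.0, A)

# Closed form (31): phi(r) = 1 + A / (2(1 + r)) * [1 - 0.5 * log(1 + A)]^{-1}
closed = [1.0 + A / (2.0 * (1.0 + r)) / (1.0 - 0.5 * math.log(1.0 + A)) for r in nodes]
max_err = max(abs(a - b) for a, b in zip(phi, closed))
print(max_err)
```

The same routine can be pointed at (16) or (19) by swapping the kernel and the free term; for kernels that are not contractions a direct linear solve would be the safer choice.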

3. An example. Consider the exponential model with the pre-change mean 1 and the post-change mean $\theta^{-1}$, $\theta > 1$, i.e., $f_\infty(x) = e^{-x} \mathbb{1}_{\{x \ge 0\}}$ and $f_0(x) = \theta e^{-\theta x} \mathbb{1}_{\{x \ge 0\}}$. We will call this model the $\mathcal{E}(1, \theta)$-model. In the sequel we will assume that $\theta = 2$ and the thresholds in both procedures SR-r and SRP do not exceed 2.

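Theorem 2. Assume the $\mathcal{E}(1, 2)$-model. Let in the SR-r procedure $T_{\mathrm{sr}}^{r_A}(A)$ the initializing value be chosen as $r_A = \sqrt{1 + A} - 1$ and let the threshold $A = A_\gamma$ be selected from the transcendental equation

(24)    $A + (\gamma - 1) \sqrt{1 + A}\, \log(1 + A) - 2 (\gamma - 1) \sqrt{1 + A} = 0.$

Then, for every $1 < \gamma < \gamma_0$, where $\gamma_0 = (1 - 0.5 \log 3)^{-1} \approx 2.2188$, the ARL to false alarm $E_\infty[T_{\mathrm{sr}}^{r_A}(A)] = \gamma$ and the SR-r procedure is minimax, i.e.,

(25)    $\mathcal{J}_P(T_{\mathrm{sr}}^{r_A}(A)) = \inf_{\{T : E_\infty T \ge \gamma\}} \mathcal{J}_P(T).$

Let in the SRP procedure the threshold $B = B_\gamma$ be selected as

(26)    $B = \exp\Big\{ \frac{2 (\gamma - 1)}{\gamma} \Big\} - 1.$

Then $E_\infty[T_{\mathrm{srp}}(B)] = \gamma$ and $\mathcal{J}_P(T_{\mathrm{srp}}(B)) > \mathcal{J}_P(T_{\mathrm{sr}}^{r_A}(A))$ for all $1 < \gamma < \gamma_0$. Therefore, the SRP procedure is suboptimal.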
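The two threshold rules of Theorem 2 are easy to evaluate numerically. The sketch below solves the transcendental equation (24) by bisection on $(0, 2)$ and evaluates (26) directly; the function names and the tolerance are hypothetical choices, and bisection is simply one convenient root finder, not a method prescribed by the paper.

```python
import math


def sr_r_threshold(gamma, tol=1e-12):
    """Root A of equation (24) on (0, 2), found by bisection (assumes 1 < gamma < gamma_0)."""
    def g(a):
        s = math.sqrt(1.0 + a)
        return a + (gamma - 1.0) * s * math.log(1.0 + a) - 2.0 * (gamma - 1.0) * s

    lo, hi = 0.0, 2.0          # g(0) < 0 and g(2) > 0 for 1 < gamma < gamma_0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def srp_threshold(gamma):
    """Threshold B from equation (26)."""
    return math.exp(2.0 * (gamma - 1.0) / gamma) - 1.0


gamma0 = 1.0 / (1.0 - 0.5 * math.log(3.0))
A = sr_r_threshold(2.0)
r_A = math.sqrt(1.0 + A) - 1.0
B = srp_threshold(2.0)
print(gamma0, A, r_A, B)
```

For $\gamma = 2$ this gives values that agree with the ones quoted later in the section ($A \approx 1.66485$, $r_A \approx 0.63244$, $B = e - 1$).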
Proof. Consider first the SRP procedure. As will become apparent later, the threshold $B = B_\gamma$ in this procedure does not exceed 2 when $\gamma < \gamma_0$. By (20), for $B < 2$ the quasi-stationary density $q_B(x) = dQ_B(x)/dx$ satisfies the integral equation

    $\lambda_B\, q_B(x) = \frac{1}{2} \int_0^B q_B(r) \frac{dr}{1 + r},$

which due to the constraint (21) yields $\lambda_B = \frac{1}{2} \log(1 + B)$ and $q_B(x) = B^{-1} \mathbb{1}_{\{x \in [0, B)\}}$. Thus, for $B < 2$ the quasi-stationary distribution $Q_B(x) = x/B$ is uniform and, moreover, it is attained already for $n = 1$ when the very first observation becomes available.

Clearly, the $P_\infty$-distribution of the SRP stopping time $T_{\mathrm{srp}}$ is geometric with the parameter $1 - \lambda_B$, so that the ARL to false alarm is

(27)    $E_\infty[T_{\mathrm{srp}}(B)] = \frac{1}{1 - \lambda_B} = \frac{1}{1 - \frac{1}{2} \log(1 + B)}.$

It follows that $E_\infty[T_{\mathrm{srp}}(B)] = \gamma$ when the threshold $B = B_\gamma$ is chosen as in (26) and that $B < 2$ whenever $\gamma < \gamma_0$.
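The geometric run length in (27) can be sanity-checked by simulation. The sketch below is a hypothetical Monte Carlo experiment, not part of the paper: it draws the starting point from the uniform quasi-stationary distribution on $[0, B)$, feeds pre-change $\mathrm{Exp}(1)$ observations into the recursion (5) with $\Lambda = 2e^{-X}$, and averages the run lengths. The seed, the sample size and the function name are arbitrary choices.

```python
import math
import random


def simulate_srp_run_length(threshold, rng):
    """One pre-change run of the SRP procedure in the assumed E(1, 2)-model."""
    stat = rng.uniform(0.0, threshold)        # R_0 ~ Uniform(0, B), the quasi-stationary law
    n = 0
    while True:
        n += 1
        x = rng.expovariate(1.0)              # pre-change observation, Exp(1)
        stat = (1.0 + stat) * 2.0 * math.exp(-x)   # Lambda = f_0(x) / f_inf(x) = 2 e^{-x}
        if stat >= threshold:
            return n


rng = random.Random(12345)
B = math.e - 1.0                              # threshold (26) for gamma = 2
runs = [simulate_srp_run_length(B, rng) for _ in range(20000)]
arl_estimate = sum(runs) / len(runs)
arl_theory = 1.0 / (1.0 - 0.5 * math.log(1.0 + B))   # equation (27)
print(arl_estimate, arl_theory)
```

With this many runs the estimate should land close to the theoretical value of 2, up to Monte Carlo error of roughly 0.01.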

By (23), the average detection delay of the SRP procedure is equal to

(28)    $E_0[T_{\mathrm{srp}}(B)] = \frac{1}{B} \int_0^B \delta_0(r)\, dr,$

so that we need to compute the ARL to detection $\delta_0(r) = E_0 T_{\mathrm{sr}}^{r}$ of the SR-r procedure, which is also needed to evaluate the performance of the SR-r procedure itself.
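Assume that $A < 2$. By (16), we have

    $\delta_0(r) = 1 + \frac{1}{2 (1 + r)^2} \int_0^A \delta_0(x)\, x\, dx,$

so that

    $\int_0^A \delta_0(r)\, r\, dr = \int_0^A r\, dr + \frac{1}{2} \Big( \int_0^A \frac{x\, dx}{(1 + x)^2} \Big) \int_0^A \delta_0(r)\, r\, dr = \frac{A^2}{2} + \frac{1}{2} \Big[ \log(1 + A) - \frac{A}{1 + A} \Big] \int_0^A \delta_0(r)\, r\, dr,$

which implies that

    $\int_0^A r\, \delta_0(r)\, dr = A^2 \Big[ \frac{A}{1 + A} + 2 \Big( 1 - \frac{1}{2} \log(1 + A) \Big) \Big]^{-1}.$

Consequently,

(29)    $\delta_0(r) = 1 + \frac{A^2}{2 (1 + r)^2} \Big[ \frac{A}{1 + A} + 2 \Big( 1 - \frac{1}{2} \log(1 + A) \Big) \Big]^{-1}.$

Using (28) and (29), we find

(30)    $E_0[T_{\mathrm{srp}}(B)] = 1 + \frac{B^2}{2 (1 + B)} \Big[ \frac{B}{1 + B} + 2 \Big( 1 - \frac{1}{2} \log(1 + B) \Big) \Big]^{-1}.$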
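The step from the closed form of $\delta_0(r)$ to the SRP delay is an average over the uniform quasi-stationary density, as in (28). The sketch below checks it numerically under the closed forms just derived: $\delta_0(r) = 1 + \frac{A^2}{2(1+r)^2}[\frac{A}{1+A} + 2(1 - \frac{1}{2}\log(1+A))]^{-1}$ and $E_0[T_{\mathrm{srp}}(B)] = 1 + \frac{B^2}{2(1+B)}[\frac{B}{1+B} + 2(1 - \frac{1}{2}\log(1+B))]^{-1}$. Function names and the quadrature size are hypothetical.

```python
import math


def bracket(a):
    """The common factor a/(1+a) + 2(1 - 0.5*log(1+a))."""
    return a / (1.0 + a) + 2.0 * (1.0 - 0.5 * math.log(1.0 + a))


def delta0(r, a):
    """Closed-form E_0 of the SR-r stopping time with head start r and threshold a < 2."""
    return 1.0 + a * a / (2.0 * (1.0 + r) ** 2) / bracket(a)


def srp_delay(b):
    """Closed-form average detection delay of the SRP procedure with threshold b < 2."""
    return 1.0 + b * b / (2.0 * (1.0 + b)) / bracket(b)


def midpoint_average(func, upper, n=2000):
    """(1/upper) * int_0^upper func(r) dr by the midpoint rule."""
    h = upper / n
    return sum(func((j + 0.5) * h) for j in range(n)) * h / upper


B = math.e - 1.0                      # threshold (26) for gamma = 2
averaged = midpoint_average(lambda r: delta0(r, B), B)
print(averaged, srp_delay(B))
```

Both numbers agree to quadrature accuracy, and for $B = e - 1$ they reproduce the delay of about 1.33275 quoted in the numerical example at the end of the section.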
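Consider now the SR-r procedure. By (15), for the ARL to false alarm $\phi(r) = E_\infty[T_{\mathrm{sr}}^{r}(A)]$ we have

    $\phi(r) = 1 + \frac{1}{2 (1 + r)} \int_0^A \phi(x)\, dx,$

so that

    $\int_0^A \phi(r)\, dr = A + \frac{1}{2} \Big( \int_0^A \frac{dr}{1 + r} \Big) \int_0^A \phi(x)\, dx = A + \frac{1}{2} \log(1 + A) \int_0^A \phi(x)\, dx,$

and therefore

    $\int_0^A \phi(r)\, dr = A \Big[ 1 - \frac{1}{2} \log(1 + A) \Big]^{-1}.$

Consequently,

(31)    $\phi(r) = 1 + \frac{A}{2 (1 + r)} \Big[ 1 - \frac{1}{2} \log(1 + A) \Big]^{-1}.$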

Recall that for $A < 2$ the statistic $R_n^r$ already attains the uniform quasi-stationary distribution at $n = 1$ for any $0 \le r < A$, so that $T_{\mathrm{sr}}^{r}$ is an equalizer for $\nu \ge 1$ and any $r \in [0, A)$, i.e., $E_\nu(T_{\mathrm{sr}}^{r} - \nu \mid T_{\mathrm{sr}}^{r} > \nu) = E_0[T_{\mathrm{srp}}(A)]$ for all $\nu \ge 1$ and $r < A$, with $E_0[T_{\mathrm{srp}}(A)]$ given by (30) with $B = A$. This implies that

(32)    $\sup_{\nu \ge 0} E_\nu(T_{\mathrm{sr}}^{r} - \nu \mid T_{\mathrm{sr}}^{r} > \nu) = \max\{ E_0[T_{\mathrm{srp}}(A)],\ \delta_0(r) \}.$

Let $r = r_A = \sqrt{1 + A} - 1$, in which case $E_0[T_{\mathrm{srp}}(A)] = \delta_0(r_A)$, i.e., for this value of the head start the SR-r procedure is an equalizer for all $\nu \ge 0$. Therefore, by Theorem 1 the procedure $T_{\mathrm{sr}}^{r_A}$ that starts from the deterministic point $r_A = \sqrt{1 + A} - 1$ is optimal, and (25) holds if threshold $A = A_\gamma$ is selected so that $E_\infty T_{\mathrm{sr}}^{r_A} = \gamma$. Substituting $r = \sqrt{1 + A} - 1$ in (31) and equating the result to $\gamma$ yields transcendental equation (24). It is easily verified that $A_\gamma < 2$ for $\gamma < \gamma_0$. This completes the proof of optimality of the SR-r procedure for all $1 < \gamma < \gamma_0$.
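The equalizer argument can be illustrated numerically: at $r = r_A$ the lower bound of Theorem 1, $(r\delta_0(r) + \psi(r))/(r + \phi(r))$, should coincide with the supremum delay. The sketch below assumes the $\mathcal{E}(1,2)$-model with $A < 2$. The closed form used for $\psi$ is not printed in the paper; it is obtained here by solving (19) in the same way as (15), so it should be read as an assumption of this sketch. All function names and the threshold value are hypothetical.

```python
import math


def bracket(a):
    return a / (1.0 + a) + 2.0 * (1.0 - 0.5 * math.log(1.0 + a))


def delta0(r, a):
    """Closed-form delta_0(r) for threshold a < 2."""
    return 1.0 + a * a / (2.0 * (1.0 + r) ** 2) / bracket(a)


def phi(r, a):
    """ARL to false alarm, equation (31)."""
    return 1.0 + a / (2.0 * (1.0 + r)) / (1.0 - 0.5 * math.log(1.0 + a))


def psi(r, a):
    """Solution of (19) with kernel 1/(2(1+r)) (derived for this sketch)."""
    c = a * a / bracket(a)                            # int_0^A x * delta_0(x) dx
    int_delta0 = a + 0.5 * c * a / (1.0 + a)          # int_0^A delta_0(x) dx
    s = int_delta0 / (1.0 - 0.5 * math.log(1.0 + a))  # int_0^A psi(x) dx
    return delta0(r, a) + s / (2.0 * (1.0 + r))


def lower_bound(r, a):
    """(r * delta_0(r) + psi(r)) / (r + phi(r))."""
    return (r * delta0(r, a) + psi(r, a)) / (r + phi(r, a))


def sup_delay(r, a):
    """Equation (32): max of the delay for nu >= 1 and delta_0(r)."""
    after_first = 1.0 + a * a / (2.0 * (1.0 + a)) / bracket(a)
    return max(after_first, delta0(r, a))


A = 1.5                                   # hypothetical threshold, below 2
r_A = math.sqrt(1.0 + A) - 1.0
gap_at_rA = sup_delay(r_A, A) - lower_bound(r_A, A)
gaps_elsewhere = [sup_delay(r, A) - lower_bound(r, A) for r in (0.0, 0.2, 1.0)]
print(gap_at_rA, gaps_elsewhere)
```

The gap vanishes at $r_A$ up to rounding and is nonnegative for the other starting points, which is what the lower bound together with the equalizer property would predict.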

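In order to show that for every given $\gamma \in (1, \gamma_0)$ the SRP procedure is inferior it suffices to show that $E_\infty[T_{\mathrm{sr}}^{r_A}(A)] > E_\infty[T_{\mathrm{srp}}(A)]$. Indeed, at a common threshold $A$ the two procedures have the same supremum delay, $E_0[T_{\mathrm{srp}}(A)] = \delta_0(r_A)$, and this delay increases with the threshold, so a strictly larger ARL for the SR-r procedure at the same threshold means a strictly smaller delay once both thresholds are matched to the same $\gamma$. By (31), the ARL to false alarm of the SR-r procedure is equal to

(33)    $E_\infty[T_{\mathrm{sr}}^{r_A}(A)] = \phi(r_A) = 1 + \frac{A}{2 \sqrt{A + 1}} \Big[ 1 - \frac{1}{2} \log(1 + A) \Big]^{-1}.$

Comparing (33) with (27), we obtain that we have only to show that

    $1 + \frac{A}{2 \sqrt{A + 1}} \Big[ 1 - \frac{1}{2} \log(1 + A) \Big]^{-1} > \Big[ 1 - \frac{1}{2} \log(1 + A) \Big]^{-1},$

i.e., that $A / \sqrt{A + 1} > \log(A + 1)$, which holds for any $A > 0$. Thus, we conclude that the SRP procedure is suboptimal and the proof is complete. □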

Let, for example, $\gamma = 2$. Then, by (26) and (30), the threshold in the SRP procedure is equal to $B = e - 1 \approx 1.71828$ and the average detection delay $E_0[T_{\mathrm{srp}}(B)] = \mathcal{J}_P(T_{\mathrm{srp}}(B)) \approx 1.33275$.

For $\gamma = 2$, solving the transcendental equation (24) yields $A \approx 1.66485$ and the initialization point $r_A \approx 0.63244$. By (32), the average detection delay of the SR-r procedure $E_0[T_{\mathrm{sr}}^{r_A}(A)] = \mathcal{J}_P(T_{\mathrm{sr}}^{r_A}(A)) \approx 1.31622$.
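The numbers in this example, and the ordering of the two delays over the whole admissible range of $\gamma$, can be reproduced with a short script. The sketch below is an illustration under the closed forms of the $\mathcal{E}(1,2)$-model: both supremum delays have the form $1 + \frac{x^2}{2(1+x)}[\frac{x}{1+x} + 2(1 - \frac{1}{2}\log(1+x))]^{-1}$, evaluated at $x = A_\gamma$ for the SR-r procedure and at $x = B_\gamma$ for the SRP procedure. The function names, the bisection tolerance and the grid of $\gamma$ values are hypothetical.

```python
import math


def delay_at_threshold(x):
    """1 + x^2 / (2(1+x)) * [x/(1+x) + 2(1 - 0.5*log(1+x))]^{-1}, for 0 < x < 2."""
    return 1.0 + x * x / (2.0 * (1.0 + x)) / (x / (1.0 + x) + 2.0 - math.log(1.0 + x))


def sr_r_threshold(gamma, tol=1e-12):
    """Root of equation (24) on (0, 2) by bisection (assumes 1 < gamma < gamma_0)."""
    def g(a):
        s = math.sqrt(1.0 + a)
        return a + (gamma - 1.0) * s * (math.log(1.0 + a) - 2.0)

    lo, hi = 0.0, 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def srp_threshold(gamma):
    """Threshold B from equation (26)."""
    return math.exp(2.0 * (gamma - 1.0) / gamma) - 1.0


sr_r_delay_2 = delay_at_threshold(sr_r_threshold(2.0))
srp_delay_2 = delay_at_threshold(srp_threshold(2.0))

gamma0 = 1.0 / (1.0 - 0.5 * math.log(3.0))
rows = []
for k in range(1, 20):
    gamma = 1.0 + k * (gamma0 - 1.0) / 20.0
    rows.append((gamma,
                 delay_at_threshold(sr_r_threshold(gamma)),
                 delay_at_threshold(srp_threshold(gamma))))

print(sr_r_delay_2, srp_delay_2)
for gamma, sr_r_delay, srp_delay in rows:
    print(f"{gamma:.4f}  {sr_r_delay:.5f}  {srp_delay:.5f}")
```

The difference between the two delays is small over the whole range, which is consistent with the third-order asymptotic optimality of the SRP procedure, but the SR-r delay is the smaller one at every $\gamma$ on the grid.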

Figure 1 depicts the supremum average detection delays versus the ARL to false alarm for the two changepoint detection procedures for the entire range of $A \in (0, 2)$.



Remark. With some additional effort, the same conclusion can be reached in the more general case where the parameter of the post-change distribution $\theta > 1$ and $A, B < \theta$.



A. S. POLUNCHENKO AND A. G. TARTAKOVSKY

Fig. 1. Supremum average detection delay versus the ARL to false alarm for $A \in (0, 2)$.